Re: [SQL] optimizing 2-table join w/millions of rows
От | Herouth Maoz |
---|---|
Тема | Re: [SQL] optimizing 2-table join w/millions of rows |
Дата | |
Msg-id | l03110700b27f0a10d780@[147.233.159.109] обсуждение исходный текст |
Ответ на | Re: [SQL] optimizing 2-table join w/millions of rows (Michael Olivier <molivier@yahoo.com>) |
Список | pgsql-sql |
At 5:05 +0200 on 23/11/98, Michael Olivier wrote: > > > How about: > > > > SELECT U.acctname > > FROM usertest U, bgndtest B > > WHERE B.acctname = U.acctname > > AND B.part_needed=3 AND B.loc_needed=5 > > AND U.part=2 AND U.loc=3; > > Yes, that looks equivalent. My problem is this is too slow an > operation as I'm benchmarking it right now. And if I add less-than or > greater-than comparisons, the performance goes _way_ down from there. > How can I get the best performance out of this kind of operation? > > Is there any way to force postgres to hold certain tables in memory > all the time? As I said, cost of memory isn't an issue, but > performance is. This is very strange. As a general rule, a simple join should always be faster than a query with a subquery. Especially when that subquery is "in" and not "exists". In principle, if there is an index on the part, loc and acctname (separate indices) in usertest, and on part_needed, loc_needed and acctname in bgndtest, the most optimal query should be: restrict each of the tables by the literal comparisons, and then match the acctname. I don't know who wrote the optimiser, and I'm not sure he is on this list. There are optimizer settings which can be changed, such as SET R_PLANS ON. In any case, if you see in "real life" that a subquery actually works better for you, then avoid the join. No reason to do both: SELECT acctname FROM bgndtest WHERE part_needed=3 and loc_needed=5 AND acctname IN ( SELECT acctname FROM usertest WHERE part=2 AND loc=3 ); Or better yet: SELECT acctname FROM bgndtest B WHERE part_needed=3 and loc_needed=5 AND EXISTS ( SELECT * FROM usertest U WHERE part=2 AND loc=3 AND U.acctname = B.acctname ); Of course, you may do it the other way around. It depends on which of your tables is supposed to have more records, and which of the comparisons is actually a "=" comparison, which is very important, because less-than and greater-than don't use the indices. What I mean by "the other way around" is: SELECT acctname FROM usertest U WHERE part=2 and loc=3 AND EXISTS ( SELECT * FROM bgndtest B WHERE part_needed=3 and loc_needed=5 AND U.acctname = B.acctname ); As for your question - there is no way to keep an entire table in memory, unless you have a ram disk. Herouth -- Herouth Maoz, Internet developer. Open University of Israel - Telem project http://telem.openu.ac.il/~herutma
В списке pgsql-sql по дате отправления: